RESIDUAL VARIANCE SCALING AND MATRIX APPROXIMATION


1.  Philosophical  Orientation 

Every  human  discipline  develops  terminology  and  concepts  peculiar  to 
its  own  needs  and  interests.  Terminology  developed  by  a  discipline  may 
shape  and  direct  but  it  can  also  obscure  the  basic  underlying  concepts 
essential  to  the  development  of  the  discipline.  This  appears  to  be  true 
for  all  human  disciplines,  whether  scientific,  political,  religious, 
esthetic,  or  what  not.  We  must,  of  course,  have  verbal,  auditory,  or  other 
types of symbols to communicate the concepts which are developed within a
discipline.  Unfortunately,  after  verbal  symbols  become  established  there 
is  often  a  tendency  to  confuse  them  with  the  fundamental  concepts  of  the 
discipline.  In  much  of  human  communication  the  problem  is  often  one  of 
semantics  rather  than  of  agreement  as  to  what  are  the  essential  concepts  of 
the  discipline. 

The  confusion  between  terminology  and  underlying  concepts  is  not 
restricted to the nonscientific disciplines. In the sciences as well as
the humanities, semantic difficulties are common. Particularly in the
sciences where we like to think that our terminology is less ambiguous than
in  other  disciplines,  the  problems  of  communication  are  not  confined  to  the 
ambiguity  of  words  alone.  But  even  here  communication  and  consequently  the 
development  of  the  science  can  be  either  impeded  or  facilitated  by  the 
selection of a particular model or set of underlying philosophical constructs
on the basis of which we attempt to regularize observations. These
observations may be generated either from events uncontrolled by the
observer, as in economics, astronomy, and so on, or by systematically
generated experience commonly known as scientific experimentation.

It is important to recognize not only when difficulties of agreement
are  due  to  semantic  ambiguities  but  also  when  they  are  due  to  disparities 
among  the  underlying  philosophical  constructs  utilized  either  consciously 
or  unconsciously  by  the  communicators.  The  problems  of  semantics  and 
philosophical constructs are perhaps nowhere more pronounced in the scientific
disciplines than in the field of psychology. Communication and therefore
progress in psychological science can be impeded by preoccupation with
both  semantics  and  philosophical  models  at  the  expense  of  more  basic  issues. 

A  striking  example  of  how  semantic  and  philosophical  ambiguities  can 
cause  confusion  in  a  discipline  is  found  in  an  area  of  psychology  where  the 
techniques  of  mathematical  statistics  have  been  introduced.  We  refer  here 
to  that  general  field  of  activity  which  has  come  to  be  known  as  factor 
analysis.  There  is,  of  course,  disagreement  as  to  what  specific  kinds  of 
activities should be designated as factor analytic. It is perhaps unfortunate
that the techniques which have come to be designated as factor analysis
have been developed and utilized more extensively in psychology than in
other scientific disciplines. One even gets the impression that factor
analysis is regarded by some as a branch of psychology. The work of Spearman
(1927) in the early part of the century contributed much to this notion that
factor analysis is a branch of psychology. It is well known, of course,
that his general and specific factor theory of intelligence formed the basis
for the numerical and statistical techniques developed to demonstrate his
two-factor theory. It is also well known that Thurstone (1947) generalized
Spearman's two-factor theory by expanding the general factor into a number
of common factors. It is further well known that Hotelling (1933), in an
effort to give mathematical elegance to the multidimensional study of
intelligence, developed what has come to be known as principal component
analysis. The distinction between Hotelling's principal component analysis
and Thurstone's common factor analysis has been the source of much controversy.
Perhaps most of this controversy is based on semantic and philosophical
preferences rather than on fundamental concepts.

In  any  case,  it  has  been  amply  demonstrated  over  the  past  several 
decades  that  factor  analysis  is  not  a  branch  of  psychology,  but  rather  that 
it is a methodology applicable to all of the sciences. It has not been so
clearly  demonstrated  that  factor  analysis  is  a  general  methodology  of  which 
there  are  many  special  cases.  For  example,  there  are  some  who  would  contend 
that  factor  analysis  is  a  special  case  of  mathematical  statistics.  Perhaps 
the  safest  way  to  avoid  unproductive  semantic  and  philosophical  controversy 
is to adhere as closely as possible to arithmetical concepts. It is probable
that if discussion in any field of human endeavor which purports to be
in  any  sense  constructive  were  confined  more  closely  to  arithmetical  and 
numerical  considerations,  controversy  and  ambiguity  could  be  greatly 
reduced.  In  any  case,  while  the  following  discussion  will  be  related  to 
what  has  come  to  be  known  as  factor  analysis  methodology  in  psychology,  we 
shall  attempt  to  adhere  as  closely  as  possible  to  arithmetical  concepts  and 
to  exclude  the  more  abstract  concepts  of  psychology  and  mathematical 
statistics. 

In  confining  our  discussion  primarily  to  arithmetical  considerations, 
we  exclude  also  most  of  mathematics.  The  reason  for  this  excessive 
restriction  is  that  even  in  mathematics,  semantic  and  philosophical  red 
herrings  may  confuse  communication  and  methodology.  It  is  well  known  that 
many different mathematical rationales may lead to the same numerical
results. It is probably in general true that the more ancient a discipline
the more it tends to become encrusted with irrelevant and ritualistic
semantic  and  philosophical  devices.  This  is  true  of  law,  medicine,  religion, 
philosophy,  and  mathematics,  the  last  three  of  which  are  among  the  oldest 
of  formal  human  disciplines.  It  is  hoped  therefore  that  our  presentation 
can  be  maintained  almost  exclusively  on  the  arithmetical  level,  and  that 
even  the  algebra  which  it  is  necessary  to  employ  will  be  merely  shorthand 
notation  for  the  arithmetic  operations  involved.  Even  though  we  shall 
attempt  to  restrict  the  major  part  of  our  discussion  to  numerical  concepts, 
we  shall  nevertheless  relate  the  procedures  to  methods  and  systems  developed 
by  psychologists  and  mathematical  statisticians.  Our  own  notation  and 
terminology  will  follow  closely  that  which  we  have  developed  previously 
(Horst,  1963,  1965)  to  circumvent  some  of  the  more  cumbersome  nomenclature 
of  traditional  mathematics. 

2.  The  Arithmetical  Model 


Suppose we have given an N x n basic data matrix X with N > n and

$$X' 1 = 0 \tag{2.1}$$

Consider an approximation matrix U of rank m with m < n such that

$$X - U = \varepsilon \tag{2.2}$$

where $\varepsilon$ is a residual matrix and

$$U' \varepsilon = 0 \tag{2.3}$$

Let

$$G = X' X / N \tag{2.4}$$

$$G_u = U' U / N \tag{2.5}$$

$$G_\varepsilon = \varepsilon' \varepsilon / N \tag{2.6}$$

From Eqs. 2.2 through 2.6

$$G - G_u = G_\varepsilon \tag{2.7}$$

Let A be an n x m matrix such that

$$G_u = A A' \tag{2.8}$$

Let

$$D_\varepsilon = \mathrm{diag}(G_\varepsilon) \tag{2.9}$$

$$D_{AA'} = \mathrm{diag}(A A') \tag{2.10}$$

$$D_G = \mathrm{diag}(G) \tag{2.11}$$

$$E = D_\varepsilon^{-1/2} G_\varepsilon D_\varepsilon^{-1/2} - I \tag{2.12}$$

From Eqs. 2.7 through 2.12

$$(D_G - D_{AA'})^{-1/2} (G - A A') (D_G - D_{AA'})^{-1/2} - I = E \tag{2.13}$$

Note that

$$D_E = 0 \tag{2.14}$$

Let

$$a = D_\varepsilon^{-1/2} A \tag{2.15}$$

$$C = D_\varepsilon^{-1/2} G D_\varepsilon^{-1/2} \tag{2.16}$$

From Eqs. 2.13 through 2.16

$$C - a a' - I = E \tag{2.17}$$

Let the basic structure of C be indicated by

$$C = Q \delta Q' \tag{2.18}$$

and let

$$Q = [Q_m \;\; Q_s] \tag{2.19}$$

$$\delta = \begin{bmatrix} \delta_m & 0 \\ 0 & \delta_s \end{bmatrix} \tag{2.20}$$

where m and s are dimensionality subscripts and

$$m + s = n \tag{2.21}$$

Let

$$\phi = \mathrm{tr}\, E^2 \tag{2.22}$$

We wish now to determine a so that $\phi$ is a minimum. Equation 2.17 means
obviously that the matrix X has been scaled so that the variances of the
residual matrix are all unity. The minimization of $\phi$ in Eq. 2.22 means that
the sum of squared correlations for the residual matrix is a minimum.

It is well known that $\phi$ will be a minimum when

$$a = Q_m (\delta_m - I_m)^{1/2} \tag{2.23}$$

hence for a to be real, the smallest $\delta$ in $\delta_m$ must be

$$\delta_{m_m} > 1 \tag{2.24}$$

From Eqs. 2.17 through 2.20, and 2.23, it can readily be shown that

$$\phi = \mathrm{tr}\, (\delta_s - I_s)^2 \tag{2.25}$$

Because of Eq. 2.14 we have

$$\mathrm{tr}\, \delta_s = s \tag{2.26}$$

Hence $\phi$ is simply s times the variance of the s smallest roots or basic
diagonal elements of C in Eq. 2.18.
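The relation between Eqs. 2.25 and 2.26 can be checked arithmetically. The following is a minimal sketch, not part of the original development: the function names and the four roots are hypothetical, chosen only so that the roots sum to s as Eq. 2.26 requires. When the mean of the s smallest roots is unity, the sum of squared deviations from 1 coincides with s times their variance.

```python
def phi_from_roots(small_roots):
    """Eq. 2.25: phi = tr (delta_s - I_s)^2 for the s smallest roots."""
    return sum((d - 1.0) ** 2 for d in small_roots)


def s_times_variance(small_roots):
    """s times the (population) variance of the roots."""
    s = len(small_roots)
    mean = sum(small_roots) / s
    # s * (1/s) * sum of squared deviations from the mean
    return sum((d - mean) ** 2 for d in small_roots)


# Hypothetical roots chosen so that their sum equals s = 4 (Eq. 2.26).
roots = [1.3, 1.1, 0.9, 0.7]
phi = phi_from_roots(roots)
```

The two functions agree only because the roots average to 1; for roots with any other mean the two quantities differ.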

It is of interest to note that because of Eqs. 2.16 through 2.20, and
2.24, we may write

$$D_\varepsilon^{-1/2} G D_\varepsilon^{-1/2} (D_\varepsilon^{-1/2} A) = D_\varepsilon^{-1/2} A (I + A' D_\varepsilon^{-1} A) \tag{2.27}$$

or more simply

$$(C - I) a = a a' a \tag{2.28}$$

Equations 2.27 and 2.28 are mathematically equivalent to those given by
Lawley (1940), Rao (1955), and others, and usually derived from much more
elaborate theoretical constructs. The problem of determining A to minimize
$\phi$ has received much attention by these and other investigators. All
methods proposed require iterative procedures beginning with initial estimates
of A or functions of its elements. Three major difficulties have
been encountered: (1) the determination of suitable initial estimates;
(2) excessive computation time, even with electronic computers; (3) so-called
improper solutions in which some of the elements of $D_\varepsilon$ may be
negative.

The methods referred to have been insistently designated "factor analysis"
to distinguish them from what some writers prefer to call principal
component analysis. More specifically, they have been variously called
maximum likelihood factor analysis, canonical factor analysis, and maximum
determinant factor analysis. We have preferred to circumvent the distinction
between factor analysis and principal component analysis and to refer
to the algebraic model as a specificity scaling model (Horst, 1965a). It
will be noted also that our approach emphasizes the scaling and decomposition
of the data matrix rather than of the covariance matrix of the data
matrix, although this distinction is not germane to the solution.

3.  Computational  Rationale 

Semantic  and  philosophical  preferences  aside,  a  computational  procedure 
developed  by  Joreskog  (1966)  appears  to  be  the  best  available  to  date  with 
reference  to  the  problems  of  initial  estimates,  computational  speed  and 
accuracy,  and  proper  solutions  for  residual  variances.  His  development 
provides significance tests for specific values of m. These tests are based
on the more elaborate philosophical substructure of his model which we do
not include in our arithmetical development.

We have previously (Horst, 1965a) presented a computational solution
which is a special case of a more general basic structure type solution
(Horst, 1965b). The solution cited suffers both from unsatisfactory
specifications for the selection of initial values and excessive computation
time.  It  appears,  however,  to  restrict  the  residual  variances  to  positive 
values.  The  method  begins  with  a  consideration  of  the  general  Gramian 
matrix  G  and  a  factor  loading  matrix  A  such  that 

$$G - A A' = \varepsilon \tag{3.1}$$

We determine A in Eq. 3.1 so as to minimize $\mathrm{tr}\, \varepsilon^2$. We indicate the basic
structure of G by

$$G = Q_m \delta_m Q_m' + Q_s \delta_s Q_s' \tag{3.2}$$

where m and s are dimensionality subscripts which correspond to the first
m and last s latent roots and vectors of G.

It is well known that the solution for A of width m which minimizes
$\mathrm{tr}\, \varepsilon^2$ is

$$A = Q_m \delta_m^{1/2} \tag{3.3}$$

From Eqs. 3.1, 3.2, and 3.3

$$A = G A (A' G A)^{-1/2} h \tag{3.4}$$

where h is an arbitrary square orthonormal matrix. In particular, we may
indicate the triangular factoring

$$t t' = A' G A \tag{3.5}$$

Then h in Eq. 3.4 may be such that

$$t'^{-1} = (A' G A)^{-1/2} h \tag{3.6}$$

From Eqs. 3.4 through 3.6

$$A = G A t'^{-1} \tag{3.7}$$

Suppose we choose an arbitrary matrix ${}_0A$ of width m, subject only to
the restriction that ${}_0A$ is basic. We then write the iteration equations

$${}_it \, {}_it' = {}_iA' G \, {}_iA \tag{3.8}$$

$${}_{i+1}A = G \, {}_iA \, {}_it'^{-1} \tag{3.9}$$

It has been shown (Horst, 1965b) that ${}_{i+1}A$ converges to $Q_m \delta_m^{1/2}$ and therefore
minimizes $\mathrm{tr}\, \varepsilon^2$ as i increases without limit.
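The iteration of Eqs. 3.8 and 3.9 is easiest to follow in the special case m = 1, where the triangular factor t is a scalar. The following is a minimal sketch under that restriction, not the general procedure: the function names and the 2 x 2 matrix are hypothetical, and a general implementation would need a triangular (Cholesky) factoring in place of the square root.

```python
import math


def mat_vec(G, a):
    """Product of a square matrix (list of rows) and a vector."""
    return [sum(g * x for g, x in zip(row, a)) for row in G]


def iterate_single_factor(G, a0, n_iter=200):
    """Eqs. 3.8 and 3.9 for one factor (m = 1), so t is a scalar."""
    a = list(a0)
    for _ in range(n_iter):
        Ga = mat_vec(G, a)
        t = math.sqrt(sum(x * y for x, y in zip(a, Ga)))  # t t' = a' G a
        a = [x / t for x in Ga]                           # next a = G a t'^-1
    return a


# Hypothetical Gramian matrix with latent roots 3 and 1.
G = [[2.0, 1.0], [1.0, 2.0]]
a = iterate_single_factor(G, [1.0, 0.0])
```

For this matrix the largest root is 3 with vector (1, 1)/sqrt(2), so the iterate should approach sqrt(3) times that vector, each element being sqrt(1.5).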

We have used a modification of this method to solve for A in Eq. 2.13
(Horst, 1965). We let

$$D = (D_G - D_{AA'}) \tag{3.10}$$

$$t t' = A' D^{-1/2} (D^{-1/2} G D^{-1/2} - I) D^{-1/2} A \tag{3.11}$$

$$D^{-1/2} A = (D^{-1/2} G D^{-1/2} - I) D^{-1/2} A \, t'^{-1} \tag{3.12}$$

From Eq. 3.11

$$t t' = A' D^{-1} G D^{-1} A - A' D^{-1} A \tag{3.13}$$

From Eq. 3.12

$$A = (G D^{-1} A - A) \, t'^{-1} \tag{3.14}$$

We may now let

$$U = D^{-1} A \tag{3.15}$$

$$W = G U - A \tag{3.16}$$

Then from Eqs. 3.13 through 3.16

$$t t' = U' W \tag{3.17}$$

$$\begin{bmatrix} t \\ A \end{bmatrix} t' = \begin{bmatrix} U' W \\ W \end{bmatrix} \tag{3.18}$$

Thus the partial triangular factoring of the supermatrix on the right of
Eq. 3.18 yields the factor loading matrix A as the lower submatrix on the
left. This leads to the iteration equations

$${}_iD = (D_G - D_{{}_iA \, {}_iA'}) \tag{3.19}$$

$${}_iU = {}_iD^{-1} \, {}_iA \tag{3.20}$$

$${}_iW = G \, {}_iU - {}_iA \tag{3.21}$$

$$\begin{bmatrix} {}_it \\ {}_{i+1}A \end{bmatrix} {}_it' = \begin{bmatrix} {}_iU' \, {}_iW \\ {}_iW \end{bmatrix} \tag{3.22}$$


Equations 3.19 through 3.22 constitute in slightly different form and
notation those we have previously given for the specificity scaled factor
analysis solution (Horst, 1965c). We originally suggested that ${}_0A$ be taken
as the principal axis factor matrix for m factors of the correlation matrix
corresponding to G. As is well known, the specificity scaled solution is
independent of scale for the original variables and hence the correlation
matrix R may, without loss of generality, be taken as G, an arbitrarily
scaled covariance matrix. When the principal axis solution is taken for ${}_0A$,
it is obvious from Eq. 2.23 that the number of assumed factors cannot exceed
the number of roots of R greater than unity. This restriction is consonant
with the recommendations of Kaiser (1960) and others for an upper bound to
the number of factors.

Let us now return to Eq. 2.15. From this it can be shown that, with G
taken as the correlation matrix so that $D_G = I$,

$$[D_G + D_{aa'}] = [D_G - D_{AA'}]^{-1} \tag{3.23}$$

Let

$$\Delta = D_G + D_{aa'} \tag{3.24}$$

From Eqs. 3.23, 3.24, and 3.12

$$t t' = a' \Delta^{1/2} G \Delta^{1/2} a - a' a \tag{3.25}$$

and

$$a = (\Delta^{1/2} G \Delta^{1/2} - I) \, a \, t'^{-1} \tag{3.26}$$

From Eqs. 2.15 and 3.23

$$A = (D_G + D_{aa'})^{-1/2} a \tag{3.27}$$

The iterative solution indicated by Eqs. 3.25 and 3.26 shows that because
of Eq. 3.24 no iteration can yield a negative $\Delta$, or because of Eq. 3.23, a
negative residual variance.

4.  Initial  Estimates 

However,  the  method  previously  outlined  (Horst,  1965c)  does  suffer  from 
several weaknesses. First, the principal axis approximation for the ${}_0A$
matrix  as  determined  from  the  correlation  matrix  does  not  appear  to  be 
satisfactory.  Second,  the  iterations  converge  slowly.  Third,  there  is  not 
adequate  assurance  that  the  convergence  is  to  an  absolute  rather  than  a  local 
minimum. 

To  overcome  the  first  objection  we  take  a  cue  from  the  image  analysis 
model of Guttman (1953). We consider the residual matrix obtained when each
variable  is  estimated  by  conventional  least  square  procedures  from  all  the 
others.  The  covariance  matrix  of  this  residual  matrix  is  well  known  to  be 
given  by 


and has been called by Guttman (1953) the anti-image matrix.
It is clear therefore that if a covariance matrix is scaled by the square
roots of the diagonals of its inverse, the anti-image matrix of the rescaled
covariance matrix will have unity in the diagonals.
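The unit-diagonal property can be verified on a small example. The following is a minimal sketch restricted to the 2 x 2 case, with hypothetical function names and a hypothetical covariance matrix. It assumes the standard form of the anti-image covariance matrix, namely the inverse of the covariance matrix scaled on both sides by the reciprocals of its own diagonal elements; that form is an assumption stated here and not a reproduction of the displayed equations of this section.

```python
import math


def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix given as a list of rows."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def anti_image(M):
    """Assumed form: D^-1 M^-1 D^-1 with D = diag(M^-1)."""
    inv = inverse_2x2(M)
    return [[inv[i][j] / (inv[i][i] * inv[j][j]) for j in range(2)]
            for i in range(2)]


def rescale_by_inverse_diagonal(M):
    """Scale M on both sides by the square roots of diag(M^-1)."""
    inv = inverse_2x2(M)
    d = [math.sqrt(inv[i][i]) for i in range(2)]
    return [[d[i] * M[i][j] * d[j] for j in range(2)] for i in range(2)]


# Hypothetical covariance matrix.
G = [[4.0, 1.2], [1.2, 1.0]]
C = rescale_by_inverse_diagonal(G)
anti = anti_image(C)
```

Before rescaling, the diagonal of the anti-image matrix of G holds the residual variances 2.56 and 0.64; after rescaling, both diagonal elements of `anti` are unity to rounding error.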

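We begin now by rescaling the matrix G as indicated in Eq. 4.5, and
let the basic structure of C be

$$C = Q_m \delta_m Q_m' + Q_s \delta_s Q_s' \tag{4.6}$$

We let

$${}_0a = Q_m (\delta_m - I_m)^{1/2} \tag{4.7}$$

$${}_0A = (D_G + D_{{}_0a \, {}_0a'})^{-1/2} \, {}_0a \tag{4.8}$$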

When ${}_0A$ is used for i = 0 in Eqs. 3.19 through 3.22, the value $\phi$ for
successive iterations drops much more rapidly than when the approximation
${}_0A$ is based on the largest latent roots and associated vectors of the
correlation matrix. For data from Hemmerle (1965), re-analyzed by Joreskog
(1966), it was also found that with a sufficient number of iterations the
value of his criterion and $\psi$ (our D) values were closely approximated. For
this example it appears therefore that the absolute minimum rather than a
local minimum was reached. Furthermore, no problems of negative residual
variances were encountered although several variables which Joreskog (1966)
found to have $\psi$ values on the boundary appeared small, as will be
subsequently indicated.

5. Iterative Procedures

However, the number of iterations required to achieve Joreskog's
solution for Hemmerle's data was 10,000, and required about 21 minutes on
the IBM 7094 MOD 1. It was noted, however, that after about 20 iterations
a definite drift appeared to establish itself so that the vectors of
differences between successive D vectors decreased slowly. The iteration
procedure was therefore modified to take advantage of this regularity as
follows:

Let

$K_1$ be a specified number of iterations
$K_2$ be a specified number of sets of iterations
$E_1$ be a parameter to be empirically determined
$E_2$ be the minimum value allowed for any element of $D$ in Eq. 3.

For any iteration i we may calculate the criterion

$${}_i\phi = \mathrm{tr}\, ({}_iD^{-1/2} (G - {}_iA \, {}_iA') \, {}_iD^{-1/2} - I)^2 \tag{5.1}$$

However,  this  criterion  need  be  calculated  only  at  prespecified  intervals. 
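The criterion of Eq. 5.1 is the sum of squared off-diagonal residual correlations, and it can be computed directly from G and a trial A. The following is a minimal sketch with a hypothetical function name and hypothetical data. The residual variances are taken as the diagonal of G minus the diagonal of A A', as in Eq. 3.19, and are assumed to be positive.

```python
import math


def criterion_phi(G, A):
    """Eq. 5.1: trace of the squared residual correlation matrix less I."""
    n, m = len(G), len(A[0])
    # Reproduced part A A'
    AAt = [[sum(A[i][k] * A[j][k] for k in range(m)) for j in range(n)]
           for i in range(n)]
    # Residual variances (assumed positive)
    d = [G[i][i] - AAt[i][i] for i in range(n)]
    # E = D^-1/2 (G - A A') D^-1/2 - I
    E = [[(G[i][j] - AAt[i][j]) / math.sqrt(d[i] * d[j])
          - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # tr E^2 for a square matrix E
    return sum(E[i][j] * E[j][i] for i in range(n) for j in range(n))


# Hypothetical one-factor loadings and a matrix they reproduce exactly.
A = [[0.8], [0.6], [0.5]]
G_exact = [[1.0, 0.48, 0.40], [0.48, 1.0, 0.30], [0.40, 0.30, 1.0]]
# The same matrix with one covariance perturbed by 0.06.
G_off = [[1.0, 0.54, 0.40], [0.54, 1.0, 0.30], [0.40, 0.30, 1.0]]
phi_exact = criterion_phi(G_exact, A)
phi_off = criterion_phi(G_off, A)
```

For the exact matrix the criterion vanishes. For the perturbed matrix the single residual correlation is 0.06/0.48 = 0.125, which enters twice, giving a criterion of 0.03125.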

We proceed as follows:

$K_1$ iterations are computed of the type 3.19 through 3.22 for the set of
$K_2$ iterations. We calculate


We assume now that

$$A = {}_{K_1}A + a U \tag{5.3}$$

where ${}_{K_1}A$ is the last A of the set and a is some positive scalar
quantity. In particular, we let

$$a = K_2 \tag{5.4}$$

where $K_2$ is the serial order of the set. From Eq. 5.3 we calculate

$$D = D_G - D_{AA'} \tag{5.5}$$

If no element of D in Eq. 5.5 is less than $E_2$, we take A as given in Eq.
5.3 and continue with the next set of iterations. Otherwise we take A as
the last A of the set and reduce $K_2$ to

$$K_2 = K_2 / n_c \tag{5.6}$$

where $n_c$ is a positive number empirically determined.

We continue in this manner so that for each set of iterations we
calculate Eq. 5.2 from the last two iteration cycles of the set and Eq.
5.1 from the last iteration cycle. The value $K_2$ in Eq. 5.3 increases by 1
for each set of iterations, and the beginning A for the next set of
iterations is given by Eq. 5.3 unless an element of D in Eq. 5.5 is less
than $E_2$. In this case, $K_2$ is first reduced by Eq. 5.6, and the beginning
A for the next set of iterations is taken as the last A from the previous set.
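The control flow of the accelerated procedure can be sketched on a toy problem. The following is a minimal sketch and not the author's program. It assumes that the quantity of Eq. 5.2 is the difference between the last two iterates of a set, which is an assumption since that equation is not legible here. The names `accelerated_iterations`, `plain_iterations`, `toy_step`, `gain`, and `guard` are hypothetical, and the toy map stands in for Eqs. 3.19 through 3.22.

```python
def plain_iterations(step, x, n):
    """Apply the basic iteration n times without acceleration."""
    for _ in range(n):
        x = step(x)
    return x


def accelerated_iterations(step, x0, guard, iters_per_set=10, n_sets=10,
                           shrink=3.0):
    """Sets of plain iterations, each followed by a guarded extrapolation."""
    x = list(x0)
    gain = 1.0  # hypothetical multiplier, reduced when the guard fails
    for k in range(1, n_sets + 1):
        previous = x
        for _ in range(iters_per_set):
            previous, x = x, step(x)
        # Assumed drift: difference between the last two iterates of the set
        drift = [xi - pi for xi, pi in zip(x, previous)]
        a = gain * k  # step length grows with the serial order of the set
        trial = [xi + a * di for xi, di in zip(x, drift)]
        if guard(trial):
            x = trial          # accept the extrapolated point
        else:
            gain /= shrink     # keep the last iterate and shorten the step
    return x


def toy_step(x):
    """Slowly converging linear map with fixed point 1.0 (illustration only)."""
    return [0.9 * xi + 0.1 for xi in x]


start = [5.0, 3.0]
accelerated = accelerated_iterations(toy_step, start, lambda v: min(v) > 0.01)
plain = plain_iterations(toy_step, start, 100)
```

Both runs use 100 applications of `toy_step`. The guard here plays the part of the lower limit on the residual variances: an extrapolated point is accepted only if every element stays above the floor, and otherwise the last plain iterate is retained.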

Presumably the success of the method depends on the choice of the constants
$K_1$, $K_2$, $E_1$, and $n_c$. For seven sets of data of widely differing
characteristics, good results were obtained with $K_1 = 10$, $K_2 = 10$, $E_1 = 10$,
and $n_c = 3$. Five of these seven sets have been analyzed by Joreskog (1966)
but his $\phi$ value is given for only one of these. Joreskog gives results
based on a number of different assumed numbers of factors for each set of
data. Since his method is presumably at least as accurate as ours and yields
in addition tests of significance for any assumed number of factors, the
only advantage ours may have is length of time required.

In our method we give only upper and lower bounds for the number of
factors and these are highly tentative. If we let

$$C = D_{R^{-1}}^{1/2} R \, D_{R^{-1}}^{1/2} \tag{5.7}$$

$$C = Q_m \delta_m Q_m' + Q_s \delta_s Q_s' \tag{5.8}$$

then the largest value of m will be such that

$$\delta_{m_m} > 1 \tag{5.9}$$

and the smallest value such that

$$\delta_{m_m} + \delta_{s_s} > 2 \tag{5.10}$$

In addition we specify that

$$m < n/2 \tag{5.11}$$

It should be noted that for the method outlined it is quite possible
for an ${}_{i+1}\phi$ value to be greater than for ${}_i\phi$. This can occur after an
acceleration indicated by Eq. 5.3. If the value of a is kept sufficiently
small it will not occur, but then the rate of convergence may be unacceptably
slow. Our procedure provides for grouping of the successively
calculated $\phi$ values into sets of a specified size, which in particular may be 1.
If the lowest $\phi$ value in set i is lower than the lowest $\phi$ in set i + 1, the
routine described is terminated and the A matrix corresponding to the lowest
$\phi$ value is taken as the starting point for a final set of iterations without
acceleration. This is a sort of polishing operation and it appears that 25
iterations is adequate for the data we have analyzed. If no reversals in
$\phi$ values are encountered, the routine method continues for some prespecified
number of sets, after which the polishing iterations occur.

6.  Numerical  Results 

Results for the seven sets of data we have analyzed are given in
Table 1. Each column of the table represents a set of data. The rows are
as follows:

Row  1  gives  the  number  of  variables  in  the  set. 

Row  2  gives  the  source  from  which  the  data  were  taken. 

Row  3  gives  the  number  assigned  to  the  set  of  data  by  Joreskog. 

Row  4a  gives  the  smaller  number  of  factors  solved  for. 

Row  4b  gives  the  number  of  factors  solved  for  by  Joreskog  which 
corresponds  most  closely  to  our  smaller  number. 

Row  4c  gives  the  larger  number  of  factors  solved  for. 

Row  4d  gives  the  number  of  factors  solved  for  by  Joreskog  which 
corresponds  most  closely  to  our  larger  number. 

Row 5a gives the $\phi/2$ values or half the sum of squared residuals for the
smaller number of factors as determined after 400 final polishing iterations
and therefore assumed to be very close to the minimum value.

Row 5b gives the $\phi/2$ values after 25 polishing iterations for the
smaller number of factors with $K_1$, $K_2$, and $E_1$ all equal to 10.

Row 5c is the same as row 5a except that the $\phi/2$ values are for the
higher number of factors.

Row 5d is the same as row 5b except for the higher number of factors.

Row 6a gives the time in seconds for the accelerated and 25 polishing
iterations for the lower number of factors. It does not include the computation
time for the initial estimate of ${}_0A$ nor for input. Perhaps 30 per
cent to 50 per cent additional time is required for the initial estimate of
${}_0A$.

Row 6b gives Joreskog's time on the CDC 3600 for the nearest corresponding
number of factors to those in 6a but does not include input and output
time.

Row 6c is the same as row 6a for the higher number of factors.

Row 6d is the same as row 6b for the nearest corresponding number of
factors to those in 6c.

It is difficult without actually running Joreskog's program on the IBM
7094 to compare our time with his. If we take his estimate that the CDC
3600 is about two and a half times as fast as the IBM 7094, it appears that
for a maximum of ten sets of accelerating iterations with ten iterations to
a set, our method is from three to five times more rapid than Joreskog's
and from 99 to 100 per cent as accurate, depending on the particular set of
data and the number of factors solved for. However, we have run our program
also on the CDC 3600. Our results indicate that the CDC 3600 is at best
only 10 per cent faster than the IBM 7094. If this is correct, then our
method is at best only 25 per cent to 100 per cent faster.

Our method does not give the level of significance at which a specified
number of factors satisfies the so-called factor analysis model as does
Joreskog's method. If desired, his tests could be added to our program. In
this case one would probably begin with our lower bound for the number of
factors  and  proceed  first  downward  and  then  upward  with  one  less  and  one 
additional factor at a time.

It is interesting to note that with Data 3 for 8 factors, the $\phi/2$ value
of .05806 is reached after 8 accelerated sets of 10 iterations each and 20
polishing iterations, or a total of 100 iterations, while this criterion is
attained only after 6,000 nonaccelerated iterations. Table 2 gives to three
decimal places for Data 3 the residual variances scaled back to unit variance
for the observed covariance matrix for a number of different cases. The
corresponding $\phi/2$ values are given in the last row. Column 1 gives our
values for 80 accelerated and 25 polishing iterations. Column 2 gives our
values for 6,000 unaccelerated iterations. The $\phi/2$ values for these two
columns are the same. Column 3 gives our values for 100 unaccelerated
iterations. The $\phi/2$ value is almost 14 per cent greater than for the same
number of accelerated iterations of column 1. Column 4 gives our values for
10,000 unaccelerated iterations. Column 5 gives Joreskog's values. The
disparity among all columns except column 3 is doubtless far less than the
accuracy of the data would require. Nevertheless the Joreskog method gives
the lowest $\phi/2$ value, .05787. This value was calculated by using the
specificity variances which he calculated to three decimal places.
Joreskog's published value for $\phi$ is .1134 so that his $\phi/2$ is .0567. We
cannot account for the discrepancy between this value and our value of .0579
calculated from his unique variances. It is perhaps possible that greater
decimal accuracy for the unique variances would have given his $\phi$ value but
only three-place accuracy was available to us.

The ratio of our residual sum of squares to that of Joreskog is 1.004
and, using 2.5 as the ratio of IBM 7094 MOD 1 to CDC 3600 time, was obtained
in less than one-fifth the time. One reason for the rapidity of our method
is that an iteration cycle indicated by equations 47 through 51 is many
times faster for a small number of factors than a basic structure solution
for the full covariance matrix. The time of the IBM 7094 MOD 1 for a
15 x 15 matrix with 8 factors is less than .12 seconds for one of our
iterations, whereas for the basic structure solution it is about 20 to 30 times
as long. Each Joreskog iteration requires a basic structure solution.

But even though our results for Data 3 with 8 factors are for all practical
purposes as good as those of Joreskog and much faster, the superiority
of the method for other numbers of factors for Data 3 and for all of the
remaining sets of data has been demonstrated only for speed and not for
accuracy. Our minimum $\phi$'s indicated in Table 1 are probably quite accurate
for the initial ${}_0A$ matrices on which they are based. Whether, however, these
lead to an absolute as well as a local minimum we have not proved empirically
or theoretically. The application of Joreskog's method for the other data
would doubtless indicate whether we are close to an absolute minimum for
positive unique variances. This would not, however, prove that our method
for selecting the initial ${}_0A$ converges in general to an absolute minimum.
That the solution is restricted to positive residual variances we have
already shown.

Even though the iteration cycles for the method we have outlined are
very rapid, columns 1 and 2 of Table 2 indicate that it is primarily the
acceleration feature which is responsible for the speed of the method. This
feature increases the speed of the method by a factor of about 60 for Data 3
with the acceleration parameters used. The question may well be raised
whether other acceleration parameters, or indeed other acceleration strategies,
may increase the rate of convergence appreciably. To date we can only say
that we have experimented with many different combinations of values or
iteration parameters and with other methods of determining the augmentation
parameter a throughout the successive iterations. To date we have found no
acceleration procedure which is clearly as good or better, from the point of
view of speed and accuracy, than the values $K_1 = 10$, $K_2 = 10$, $E_1 = 10$,
$n_c = 3$.

It is important in closing to emphasize obvious limitations of the
method we have outlined.

(1) We have not proved--and it may well not be true--that in general our
method for determining ${}_0A$ leads to an absolute rather than a local minimum
sum of squared residuals.

(2) We have not provided a method for determining the number of factors
although Joreskog's procedure for doing this might be incorporated into ours.

(3) We have by no means exhausted all possibilities for appreciably
improving the acceleration strategy.

(4) We do not know how well the acceleration strategy and parameters
would work on Gramian matrices in general.

TABLE 2

Unique Variances for Data 3

Column        1         2         3         4         5
            100     6,000       100    10,000  Joreskog
           Acc.    Un. A.    Un. A.    Un. A.

 1         .263      .262      .238      .262      .263
 2         .392      .395      .366      .395      .395
 3         .458      .457      .451      .457      .458
 4         .090      .086      .238      .083      .080
 5         .489      .485      .467      .486      .487
 6         .259      .260      .280      .259      .259
 7         .014      .010      .173      .006      .005
 8         .466      .466      .450      .465      .465
 9         .662      .663      .618      .663      .664
10         .010      .042      .143      .037      .029
11         .580      .579      .566      .579      .580
12         .504      .497      .515      .498      .499
13         .738      .736      .721      .737      .738
14         .620      .620      .586      .621      .622
15         .010      .014      .143      .009      .005

phi/2    .05806    .05806    .06586    .05796    .05787